123VCF: an intuitive and efficient tool for filtering VCF files

BMC Bioinformatics. 2024 Feb 14;25(1):68. doi: 10.1186/s12859-024-05661-5.

Abstract

Background: The advent of next-generation sequencing (NGS) has transformed medical genetics by enabling the identification of disease-associated variants. However, the sheer volume of data produced by NGS calls for a robust and dependable way to filter out irrelevant variants. Annotation-based variant filtering, a pivotal step in this process, demands a thorough understanding of case-specific conditions and of the relevant annotation tools. To address this complex task, we sought to design a variant filtering tool that is accessible, efficient and, most importantly, easy to understand.

Results: We developed 123VCF, a tool that processes both compressed and uncompressed Variant Call Format (VCF) files. Built on a Java framework, the tool employs a disk-streaming, real-time filtering algorithm that allows it to handle sizable variant files on conventional desktop computers. 123VCF filters input variants according to a predefined sequence of filters. Users can define various filtering parameters, such as quality, coverage depth, and variant frequency within populations. Additionally, 123VCF accommodates user-defined filters tailored to specific case requirements, giving users greater control over the filtering process. We evaluated the performance of 123VCF by analyzing different types of variant files and comparing its runtimes with those of the most similar tools, such as BCFtools filter and GATK VariantFiltration. The results indicated that 123VCF performs relatively well. The tool's intuitive interface and potential for reproducibility make it a valuable asset for both researchers and clinicians.
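The idea of streaming a VCF file through an ordered sequence of filters can be illustrated with a minimal Java sketch. This is not the 123VCF implementation: the class name, the helper methods, the thresholds (QUAL >= 30, DP >= 10, AF <= 0.01) and the use of the INFO keys DP and AF are all assumptions chosen for illustration, and the sketch assumes well-formed, tab-separated VCF lines with at least eight columns.

```java
import java.io.*;
import java.nio.charset.StandardCharsets;
import java.nio.file.*;
import java.util.*;
import java.util.function.Predicate;
import java.util.zip.GZIPInputStream;

// Illustrative sketch only; names and thresholds are hypothetical, not taken from 123VCF.
class StreamingVcfFilterSketch {

    /** Minimal view of one VCF data line: QUAL plus the key=value pairs of INFO. */
    static final class Variant {
        final double qual;
        final Map<String, String> info;
        Variant(double qual, Map<String, String> info) { this.qual = qual; this.info = info; }
    }

    /** Parses columns 6 (QUAL) and 8 (INFO) of a tab-separated VCF data line. */
    static Variant parse(String line) {
        String[] cols = line.split("\t");
        double qual = cols[5].equals(".") ? Double.NaN : Double.parseDouble(cols[5]);
        Map<String, String> info = new HashMap<>();
        for (String entry : cols[7].split(";")) {
            int eq = entry.indexOf('=');
            if (eq > 0) info.put(entry.substring(0, eq), entry.substring(eq + 1));
            else info.put(entry, ""); // flag-type INFO field without a value
        }
        return new Variant(qual, info);
    }

    /** Reads a numeric INFO value (first element if comma-separated), or a fallback. */
    static double number(Variant v, String key, double fallback) {
        String raw = v.info.get(key);
        if (raw == null) return fallback;
        try {
            return Double.parseDouble(raw.split(",")[0]);
        } catch (NumberFormatException e) {
            return fallback;
        }
    }

    /** Hypothetical filter sequence: quality, then coverage depth, then population frequency. */
    static List<Predicate<Variant>> exampleFilters() {
        List<Predicate<Variant>> filters = new ArrayList<>();
        filters.add(v -> v.qual >= 30.0);               // a missing QUAL (NaN) fails this test
        filters.add(v -> number(v, "DP", 0.0) >= 10.0); // a missing DP is treated as depth 0
        filters.add(v -> number(v, "AF", 0.0) <= 0.01); // assumption: a missing AF counts as rare
        return filters;
    }

    /** Streams the input line by line, so memory use does not grow with file size. */
    static long filter(BufferedReader in, Writer out, List<Predicate<Variant>> filters)
            throws IOException {
        long kept = 0;
        String line;
        while ((line = in.readLine()) != null) {
            if (line.isEmpty()) continue;
            if (line.startsWith("#")) { out.write(line + "\n"); continue; } // keep header lines
            Variant v = parse(line);
            boolean pass = true;
            for (Predicate<Variant> f : filters) {
                if (!f.test(v)) { pass = false; break; } // stop at the first failing filter
            }
            if (pass) { out.write(line + "\n"); kept++; }
        }
        out.flush();
        return kept;
    }

    /** Opens a plain or gzip-compressed file, decided by the ".gz" suffix. */
    static BufferedReader open(Path path) throws IOException {
        InputStream raw = Files.newInputStream(path);
        InputStream in = path.toString().endsWith(".gz") ? new GZIPInputStream(raw) : raw;
        return new BufferedReader(new InputStreamReader(in, StandardCharsets.UTF_8));
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: java StreamingVcfFilterSketch <in.vcf[.gz]>");
            return;
        }
        try (BufferedReader in = open(Paths.get(args[0]));
             Writer out = new BufferedWriter(new OutputStreamWriter(System.out, StandardCharsets.UTF_8))) {
            System.err.println(filter(in, out, exampleFilters()) + " variants kept");
        }
    }
}
```

Because each line is read, tested and written before the next one is loaded, only a single record is held in memory at a time, which is the property that makes large files manageable on a desktop machine. A case-specific filter would, in this sketch, simply be one more `Predicate<Variant>` appended to the list.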

Conclusion: The 123VCF filtering tool provides an effective, dependable approach for filtering variants in both research and clinical settings. As an open-source tool available at https://project123vcf.sourceforge.io, it is accessible to the global scientific and clinical community, paving the way for the discovery of disease-causing variants and facilitating the advancement of personalized medicine.

Keywords: Exome sequencing; Genome sequencing; Next generation sequencing; VCF; VCF filtering; Variant analysis; Variant filtering.

MeSH terms

  • Algorithms*
  • High-Throughput Nucleotide Sequencing
  • Reproducibility of Results
  • Software*